Search | VHL Regional Portal

1.

Early detection of emerging viral variants through analysis of community structure of coordinated substitution networks.

Mohebbi, Fatemeh; Zelikovsky, Alex; Mangul, Serghei; Chowell, Gerardo; Skums, Pavel.

Nat Commun ; 15(1): 2838, 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38565543

ABSTRACT

The emergence of viral variants with altered phenotypes is a public health challenge underscoring the need for advanced evolutionary forecasting methods. Given extensive epistatic interactions within viral genomes and known viral evolutionary history, efficient genomic surveillance necessitates early detection of emerging viral haplotypes rather than commonly targeted single mutations. Haplotype inference, however, is a significantly more challenging problem precluding the use of traditional approaches. Here, using SARS-CoV-2 evolutionary dynamics as a case study, we show that emerging haplotypes with altered transmissibility can be linked to dense communities in coordinated substitution networks, which become discernible significantly earlier than the haplotypes become prevalent. From these insights, we develop a computational framework for inference of viral variants and validate it by successful early detection of known SARS-CoV-2 strains. Our methodology offers greater scalability than phylogenetic lineage tracing and can be applied to any rapidly evolving pathogen with adequate genomic surveillance data.

Subjects

Biological Evolution , Viral Genome , Phylogeny , Early Diagnosis , Viral Genome/genetics , Genomics , SARS-CoV-2/genetics
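
The abstract above links emerging haplotypes to dense communities in a coordinated substitution network. As a minimal sketch of what "dense" means for a candidate community, the snippet below computes the fraction of possible node pairs that are joined by an edge. The function name, the node labels, and the toy edge list are assumptions for illustration; how the authors build the network and detect communities is not described in the abstract.

```python
from itertools import combinations


def community_density(nodes, edges):
    """Fraction of possible node pairs inside `nodes` that are joined by an edge."""
    members = set(nodes)
    if len(members) < 2:
        return 0.0
    edge_set = {frozenset(edge) for edge in edges}
    present = sum(
        1 for pair in combinations(sorted(members), 2) if frozenset(pair) in edge_set
    )
    possible = len(members) * (len(members) - 1) // 2
    return present / possible


# Hypothetical network: nodes are substitutions, edges join coordinated pairs.
edges = [("m1", "m2"), ("m1", "m3"), ("m2", "m3"), ("m3", "m4"), ("m4", "m5")]
print(community_density({"m1", "m2", "m3"}, edges))  # fully connected triple
print(community_density({"m3", "m4", "m5"}, edges))  # sparser triple
```

A density close to 1 means nearly every pair of substitutions in the group is coordinated, which is the kind of signal the abstract associates with an emerging haplotype.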

2.

A rigorous benchmarking of alignment-based HLA typing algorithms for RNA-seq data.

Yu, Dottie; Ayyala, Ram; Sadek, Sarah Hany; Chittampalli, Likhitha; Farooq, Hafsa; Jung, Junghyun; Nahid, Abdullah Al; Boldirev, Grigore; Jung, Mina; Park, Sungmin; Nguyen, Austin; Zelikovsky, Alex; Mancuso, Nicholas; Joo, Jong Wha J; Thompson, Reid F; Alachkar, Houda; Mangul, Serghei.

bioRxiv ; 2024 Jan 16.

Article in English | MEDLINE | ID: mdl-38293199

ABSTRACT

Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivities. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability for clinical and biomedical research to make informed choices regarding which tools to use. Here we report the study design of a comprehensive benchmarking of the performance of 12 HLA callers across 682 RNA-seq samples from 8 datasets with molecularly defined gold standard at 5 loci, HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will comprehensively assess their accuracy, compare default with optimized parameters, and examine for discrepancies in accuracy at the allele and loci levels. We will also evaluate the computational expense of each HLA caller measured in terms of CPU time and RAM. We also plan to evaluate the influence of read length over the HLA region on accuracy for each tool. Most notably, we will examine the performance of HLA callers across European and African groups, to determine discrepancies in accuracy associated with ancestry. We hypothesize that RNA-Seq HLA callers are capable of returning high-quality results, but the tools that offer a good balance between accuracy and computational expensiveness for all ancestry groups are yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.

3.

Reconstruction of Viral Variants via Monte Carlo Clustering.

Juyal, Akshay; Hosseini, Roya; Novikov, Daniel; Grinshpon, Mark; Zelikovsky, Alex.

J Comput Biol ; 30(9): 1009-1018, 2023 09.

Article in English | MEDLINE | ID: mdl-37695837

ABSTRACT

Identifying viral variants through clustering is essential for understanding the composition and structure of viral populations within and between hosts, which play a crucial role in disease progression and epidemic spread. This article proposes and validates novel Monte Carlo (MC) methods for clustering aligned viral sequences by minimizing either entropy or Hamming distance from consensuses. We validate these methods on four benchmarks: two SARS-CoV-2 interhost data sets and two HIV intrahost data sets. A parallelized version of our tool is scalable to very large data sets. We show that both entropy and Hamming distance-based MC clusterings discern the meaningful information from sequencing data. The proposed clustering methods consistently converge to similar clusterings across different runs. Finally, we show that MC clustering improves reconstruction of intrahost viral population from sequencing data.

Subjects

COVID-19 , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Benchmarking , Cluster Analysis , Disease Progression
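
The two objectives named in the abstract above, entropy and Hamming distance from cluster consensuses, can be made concrete with a small sketch. All function names and the toy sequences are assumptions for illustration, the size-weighting of the entropy is one plausible choice rather than the paper's definition, and the Monte Carlo search that would minimize these costs is omitted.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_entropy(seqs):
    """Sum of per-column entropies for one cluster of aligned sequences."""
    return sum(column_entropy(col) for col in zip(*seqs))


def consensus(seqs):
    """Most common symbol in each column of the cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def hamming_cost(seqs):
    """Total Hamming distance from each sequence to the cluster consensus."""
    cons = consensus(seqs)
    return sum(sum(a != b for a, b in zip(s, cons)) for s in seqs)


def total_entropy(clusters):
    """Size-weighted sum of cluster entropies (the weighting is an assumption)."""
    return sum(len(c) * cluster_entropy(c) for c in clusters)


def total_hamming(clusters):
    """Sum of Hamming costs over all clusters."""
    return sum(hamming_cost(c) for c in clusters)


# Toy aligned sequences grouped in two different ways.
good = [["AAAA", "AAAT"], ["CCCC", "CCCG"]]
bad = [["AAAA", "CCCC"], ["AAAT", "CCCG"]]
print(total_entropy(good), total_entropy(bad))
print(total_hamming(good), total_hamming(bad))
```

Both costs are lower for the grouping that keeps similar sequences together, which is the property a Monte Carlo search over cluster assignments would exploit.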

4.

Comparative transcriptome analysis of Peromyscus leucopus and C3H mice infected with the Lyme disease pathogen.

Gaber, Alhussien M; Mandric, Igor; Nitirahardjo, Caroline; Piontkivska, Helen; Hillhouse, Andrew E; Threadgill, David W; Zelikovsky, Alex; Rogovskyy, Artem S.

Front Cell Infect Microbiol ; 13: 1115350, 2023.

Article in English | MEDLINE | ID: mdl-37113133

ABSTRACT

Lyme disease (LD), the most prevalent tick-borne disease of humans in the Northern Hemisphere, is caused by the spirochetal bacterium of Borreliella burgdorferi (Bb) sensu lato complex. In nature, Bb spirochetes are continuously transmitted between Ixodes ticks and mammalian or avian reservoir hosts. Peromyscus leucopus mice are considered the primary mammalian reservoir of Bb in the United States. Earlier studies demonstrated that experimentally infected P. leucopus mice do not develop disease. In contrast, C3H mice, a widely used laboratory strain of Mus musculus in the LD field, develop severe Lyme arthritis. To date, the exact tolerance mechanism of P. leucopus mice to Bb-induced infection remains unknown. To address this knowledge gap, the present study has compared spleen transcriptomes of P. leucopus and C3H/HeJ mice infected with Bb strain 297 with those of their respective uninfected controls. Overall, the data showed that the spleen transcriptome of Bb-infected P. leucopus mice was much more quiescent compared to that of the infected C3H mice. To date, the current investigation is one of the few that have examined the transcriptome response of natural reservoir hosts to Borreliella infection. Although the experimental design of this study significantly differed from those of two previous investigations, the collective results of the current and published studies have consistently demonstrated very limited transcriptomic responses of different reservoir hosts to the persistent infection of LD pathogens. Importance: The bacterium Borreliella burgdorferi (Bb) causes Lyme disease, which is one of the emerging and highly debilitating human diseases in countries of the Northern Hemisphere. In nature, Bb spirochetes are maintained between hard ticks of Ixodes spp. and mammals or birds. In the United States, the white-footed mouse, Peromyscus leucopus, is one of the main Bb reservoirs. In contrast to humans and laboratory mice (e.g., C3H mice), white-footed mice rarely develop clinical signs (disease) despite being (persistently) infected with Bb. How the white-footed mouse tolerates Bb infection is the question that the present study has attempted to address. Comparisons of genetic responses between Bb-infected and uninfected mice demonstrated that, during a long-term Bb infection, C3H mice reacted much more strongly, whereas P. leucopus mice were relatively unresponsive.

Subjects

Borrelia burgdorferi , Ixodes , Lyme Disease , Animals , Mice , Humans , Peromyscus/microbiology , Transcriptome , Inbred C3H Mice , Disease Reservoirs , Lyme Disease/microbiology , Borrelia burgdorferi/genetics , Ixodes/microbiology , Gene Expression Profiling

5.

Identifying Biomarkers Using Support Vector Machine to Understand the Racial Disparity in Triple-Negative Breast Cancer.

Sahoo, Bikram; Pinnix, Zandra; Sims, Seth; Zelikovsky, Alex.

J Comput Biol ; 30(4): 502-517, 2023 04.

Article in English | MEDLINE | ID: mdl-36716280

ABSTRACT

With the properties of aggressive cancer and heterogeneous tumor biology, triple-negative breast cancer (TNBC) is a type of breast cancer known for its poor clinical outcome. The lack of estrogen, progesterone, and human epidermal growth factor receptor in the tumors of TNBC leads to fewer treatment options in clinics. The incidence of TNBC is higher, and clinical outcomes are worse, in African American (AA) women than in European American (EA) women. The significant factors responsible for the racial disparity in TNBC are socioeconomic lifestyle and tumor biology. The current study considered open-source gene expression data of triple-negative breast cancer samples together with the samples' racial information. We implemented a state-of-the-art Support Vector Machine (SVM) classification method with a recursive feature elimination approach on the gene expression data to identify significant biomarkers deregulated in AA women and EA women. We also included Spearman's rho and Ward's linkage method in our feature selection workflow. Our proposed method generates 24 features/genes that can classify the AA and EA samples with 98% accuracy. We also performed the Kaplan-Meier analysis and log-rank test on the 24 features/genes. For only 2 of the 24 genes, KLK10 and LRRC37A2, we discussed the correlation between deregulated expression and cancer progression with a poor survival rate. We believe that further improvement of our method with a larger amount of RNA-seq gene expression data will provide more accurate insight into racial disparity in TNBC.

Subjects

Health Status Disparities , Triple Negative Breast Neoplasms , Female , Humans , Tumor Biomarkers/genetics , Black or African American/genetics , Support Vector Machine , Triple Negative Breast Neoplasms/ethnology , Triple Negative Breast Neoplasms/pathology , White People/genetics

6.

Sensory Nerves Impede the Formation of Tertiary Lymphoid Structures and Development of Protective Antimelanoma Immune Responses.

Vats, Kavita; Kruglov, Oleg; Sahoo, Bikram; Soman, Vishal; Zhang, Jiying; Shurin, Galina V; Chandran, Uma R; Skums, Pavel; Shurin, Michael R; Zelikovsky, Alex; Storkus, Walter J; Bunimovich, Yuri L.

Cancer Immunol Res ; 10(9): 1141-1154, 2022 09 01.

Article in English | MEDLINE | ID: mdl-35834791

ABSTRACT

Peripheral neurons comprise a critical component of the tumor microenvironment (TME). The role of the autonomic innervation in cancer has been firmly established. However, the effect of the afferent (sensory) neurons on tumor progression remains unclear. Utilizing surgical and chemical skin sensory denervation methods, we showed that afferent neurons supported the growth of melanoma tumors in vivo and demonstrated that sensory innervation limited the activation of effective antitumor immune responses. Specifically, sensory ablation led to improved leukocyte recruitment into tumors, with decreased presence of lymphoid and myeloid immunosuppressive cells and increased activation of T-effector cells within the TME. Cutaneous sensory nerves hindered the maturation of intratumoral high endothelial venules and limited the formation of mature tertiary lymphoid-like structures containing organized clusters of CD4+ T cells and B cells. Denervation further increased T-cell clonality and expanded the B-cell repertoire in the TME. Importantly, CD8a depletion prevented denervation-dependent antitumor effects. Finally, we observed that gene signatures of inflammation and the content of neuron-associated transcripts inversely correlated in human primary cutaneous melanomas, with the latter representing a negative prognostic marker of patient overall survival. Our results suggest that tumor-associated sensory neurons negatively regulate the development of protective antitumor immune responses within the TME, thereby defining a novel target for therapeutic intervention in the melanoma setting.

Subjects

Melanoma , Skin Neoplasms , Tertiary Lymphoid Structures , Humans , Immunity , Tumor Microenvironment

7.

Unlocking capacities of genomics for the COVID-19 response and future pandemics.

Knyazev, Sergey; Chhugani, Karishma; Sarwal, Varuni; Ayyala, Ram; Singh, Harman; Karthikeyan, Smruthi; Deshpande, Dhrithi; Baykal, Pelin Icer; Comarova, Zoia; Lu, Angela; Porozov, Yuri; Vasylyeva, Tetyana I; Wertheim, Joel O; Tierney, Braden T; Chiu, Charles Y; Sun, Ren; Wu, Aiping; Abedalthagafi, Malak S; Pak, Victoria M; Nagaraj, Shivashankar H; Smith, Adam L; Skums, Pavel; Pasaniuc, Bogdan; Komissarov, Andrey; Mason, Christopher E; Bortz, Eric; Lemey, Philippe; Kondrashov, Fyodor; Beerenwinkel, Niko; Lam, Tommy Tsan-Yuk; Wu, Nicholas C; Zelikovsky, Alex; Knight, Rob; Crandall, Keith A; Mangul, Serghei.

Nat Methods ; 19(4): 374-380, 2022 04.

Article in English | MEDLINE | ID: mdl-35396471

Subjects

COVID-19 , Pandemics , Genomics , Humans , SARS-CoV-2/genetics

8.

Computational Approaches to Detect Illicit Drug Ads and Find Vendor Communities Within Social Media Platforms.

Zhao, Fengpan; Skums, Pavel; Zelikovsky, Alex; Sevigny, Eric L; Swahn, Monica Haavisto; Strasser, Sheryl M; Huang, Yan; Wu, Yubao.

IEEE/ACM Trans Comput Biol Bioinform ; 19(1): 180-191, 2022.

Article in English | MEDLINE | ID: mdl-32149652

ABSTRACT

The opioid abuse epidemic represents a major public health threat to global populations. The role social media may play in facilitating illicit drug trade is largely unknown due to limited research. However, it is known that social media use among adults in the US is widespread and that there is vast capability for online promotion of illegal drugs with delayed or limited deterrence of such messaging. Further, general commercial sale applications provide safeguards for transactions but do not discriminate between legal and illegal sale transactions. These characteristics of the social media environment present challenges to surveillance, which is needed for advancing knowledge of online drug markets and the role they play in drug abuse and overdose deaths. In this paper, we present a computational framework developed to automatically detect illicit drug ads and communities of vendors. The SVM- and CNN-based methods for detecting illicit drug ads, and a matrix-factorization-based method for discovering overlapping communities, have been extensively validated on a large dataset collected from Google+, Flickr and Tumblr. Pilot test results demonstrate that our computational methods can effectively identify illicit drug ads and accurately detect vendor communities. These methods hold promise to advance scientific knowledge surrounding the role social media may play in perpetuating the drug abuse epidemic.

Subjects

Advertising , Illicit Drugs , Social Media , Humans , Research Design

9.

From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering.

Melnyk, Andrew; Mohebbi, Fatemeh; Knyazev, Sergey; Sahoo, Bikram; Hosseini, Roya; Skums, Pavel; Zelikovsky, Alex; Patterson, Murray.

J Comput Biol ; 28(11): 1113-1129, 2021 11.

Article in English | MEDLINE | ID: mdl-34698508

ABSTRACT

The availability of millions of SARS-CoV-2 (Severe Acute Respiratory Syndrome-Coronavirus-2) sequences in public databases such as GISAID (Global Initiative on Sharing All Influenza Data) and EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute) (the United Kingdom) allows a detailed study of the evolution, genomic diversity, and dynamics of a virus such as never before. Here, we identify novel variants and subtypes of SARS-CoV-2 by clustering sequences with adapted methods originally designed for haplotyping intrahost viral populations. We assess our results using clustering entropy, the first time it has been used in this context. Our clustering approach reaches lower entropies compared with other methods, and we are able to boost this even further through gap filling and Monte Carlo-based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the U.K. and GISAID data sets, and is also able to detect the much less represented (<1% of the sequences) Beta (South Africa), Epsilon (California), and Gamma and Zeta (Brazil) variants in the GISAID data set. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large data sets.

Subjects

Cluster Analysis , Computational Biology/methods , Brazil , Genetic Databases , Entropy , Humans , Monte Carlo Method , South Africa , United Kingdom , United States

10.

Scalable Reconstruction of SARS-CoV-2 Phylogeny with Recurrent Mutations.

Novikov, Daniel; Knyazev, Sergey; Grinshpon, Mark; Icer, Pelin; Skums, Pavel; Zelikovsky, Alex.

J Comput Biol ; 28(11): 1130-1141, 2021 11.

Article in English | MEDLINE | ID: mdl-34698524

ABSTRACT

This article presents a novel scalable character-based phylogeny algorithm for dense viral sequencing data called SPHERE (Scalable PHylogEny with REcurrent mutations). The algorithm is based on an evolutionary model where recurrent mutations are allowed, but backward mutations are prohibited. The algorithm creates rooted character-based phylogeny trees, wherein all leaves and internal nodes are labeled by observed taxa. We show that SPHERE phylogeny is more stable than Nextstrain's, and that it accurately infers known transmission links from the early pandemic. SPHERE is a fast algorithm that can process >200,000 sequences in <2 hours, which offers a compact phylogenetic visualization of Global Initiative on Sharing All Influenza Data (GISAID).

Subjects

Mutation , Phylogeny , SARS-CoV-2/genetics , Algorithms , COVID-19/transmission , COVID-19/virology , Genetic Databases , Humans

11.

Technology dictates algorithms: recent developments in read alignment.

Alser, Mohammed; Rotman, Jeremy; Deshpande, Dhrithi; Taraszka, Kodi; Shi, Huwenbo; Baykal, Pelin Icer; Yang, Harry Taegyun; Xue, Victor; Knyazev, Sergey; Singer, Benjamin D; Balliu, Brunilda; Koslicki, David; Skums, Pavel; Zelikovsky, Alex; Alkan, Can; Mutlu, Onur; Mangul, Serghei.

Genome Biol ; 22(1): 249, 2021 08 26.

Article in English | MEDLINE | ID: mdl-34446078

ABSTRACT

Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.

Subjects

Algorithms , Computational Biology/methods , Sequence Alignment , Human Genome , HIV/physiology , Humans , Metagenomics , Sulfites

12.

Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction.

Knyazev, Sergey; Tsyvina, Viachaslau; Shankar, Anupama; Melnyk, Andrew; Artyomenko, Alexander; Malygina, Tatiana; Porozov, Yuri B; Campbell, Ellsworth M; Switzer, William M; Skums, Pavel; Mangul, Serghei; Zelikovsky, Alex.

Nucleic Acids Res ; 49(17): e102, 2021 09 27.

Article in English | MEDLINE | ID: mdl-34214168

ABSTRACT

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that remains open. Here we propose CliqueSNV, based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with a frequency below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.

Subjects

Algorithms , Computational Biology/methods , Haplotypes , High-Throughput Nucleotide Sequencing/methods , RNA Virus Infections/diagnosis , RNA Viruses/genetics , COVID-19/diagnosis , COVID-19/virology , Gene Frequency , HIV Infections/diagnosis , HIV Infections/virology , HIV-1/genetics , Humans , Mutation , Single Nucleotide Polymorphism , RNA Virus Infections/virology , Reproducibility of Results , SARS-CoV-2/genetics , Sensitivity and Specificity
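
The abstract above rests on the idea that real minority mutations tend to co-occur on the same reads, whereas sequencing errors are scattered independently. The sketch below illustrates that idea only: it flags position pairs whose minor alleles co-occur more often than independence would predict. The function names, the thresholds `min_count` and `min_ratio`, and the toy reads are assumptions; the statistical test that CliqueSNV actually applies is not described in the abstract, and this crude ratio check is a stand-in for it.

```python
from collections import Counter
from itertools import combinations


def minor_positions(reads):
    """For each read, the set of positions where it differs from the column majority."""
    majority = [Counter(col).most_common(1)[0][0] for col in zip(*reads)]
    return [{i for i, base in enumerate(r) if base != majority[i]} for r in reads]


def linked_pairs(reads, min_count=2, min_ratio=2.0):
    """Position pairs whose minor alleles co-occur more often than expected by chance."""
    n = len(reads)
    variants = minor_positions(reads)
    # Number of reads carrying a minor allele at each single position.
    single = Counter(i for v in variants for i in v)
    # Number of reads carrying minor alleles at both positions of a pair.
    both = Counter(pair for v in variants for pair in combinations(sorted(v), 2))
    linked = []
    for (i, j), observed in both.items():
        expected = single[i] * single[j] / n  # co-occurrences under independence
        if observed >= min_count and observed > min_ratio * expected:
            linked.append((i, j))
    return sorted(linked)


# Toy aligned reads: a rare haplotype with linked changes at positions 1 and 4,
# plus two unrelated single-position errors.
reads = ["AAAAAA"] * 16 + ["ACAATA"] * 2 + ["AAGAAA", "AAAAAG"]
print(linked_pairs(reads))
```

Isolated errors never form a pair here, so only the two positions carried together by the rare haplotype are reported.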

13.

Unlocking capacities of viral genomics for the COVID-19 pandemic response.

Knyazev, Sergey; Chhugani, Karishma; Sarwal, Varuni; Ayyala, Ram; Singh, Harman; Karthikeyan, Smruthi; Deshpande, Dhrithi; Comarova, Zoia; Lu, Angela; Porozov, Yuri; Wu, Aiping; Abedalthagafi, Malak S; Nagaraj, Shivashankar H; Smith, Adam L; Skums, Pavel; Ladner, Jason; Lam, Tommy Tsan-Yuk; Wu, Nicholas C; Zelikovsky, Alex; Knight, Rob; Crandall, Keith A; Mangul, Serghei.

ArXiv ; 2021 Apr 28.

Article in English | MEDLINE | ID: mdl-33948451

ABSTRACT

More than any other infectious disease epidemic, the COVID-19 pandemic has been characterized by the generation of large volumes of viral genomic data at an incredible pace due to recent advances in high-throughput sequencing technologies, the rapid global spread of SARS-CoV-2, and its persistent threat to public health. However, distinguishing the most epidemiologically relevant information encoded in these vast amounts of data requires substantial effort across the research and public health communities. Studies of SARS-CoV-2 genomes have been critical in tracking the spread of variants and understanding its epidemic dynamics, and may prove crucial for controlling future epidemics and alleviating significant public health burdens. Together, genomic data and bioinformatics methods enable broad-scale investigations of the spread of SARS-CoV-2 at the local, national, and global scales and allow researchers the ability to efficiently track the emergence of novel variants, reconstruct epidemic dynamics, and provide important insights into drug and vaccine development and disease control. Here, we discuss the tremendous opportunities that genomics offers to unlock the effective use of SARS-CoV-2 genomic data for efficient public health surveillance and guiding timely responses to COVID-19.

14.

Quantitative differences between intra-host HCV populations from persons with recently established and persistent infections.

Icer Baykal, Pelin B; Lara, James; Khudyakov, Yury; Zelikovsky, Alex; Skums, Pavel.

Virus Evol ; 7(1): veaa103, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33505710

ABSTRACT

Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. Genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected features measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Distributions of the viral population features differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by decline in genomic complexity and increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a machine learning classifier for the infection staging, which yielded a detection accuracy of 95.22 per cent, thus providing a higher accuracy than other genomic-based models. The detection of a strong association between several HCV genetic factors and stages of infection suggests that intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. The proposed models may serve as a foundation of cyber-molecular assays for staging infection, which could potentially complement and/or substitute standard laboratory assays.

15.

Using earth mover's distance for viral outbreak investigations.

Melnyk, Andrew; Knyazev, Sergey; Vannberg, Fredrik; Bunimovich, Leonid; Skums, Pavel; Zelikovsky, Alex.

BMC Genomics ; 21(Suppl 5): 582, 2020 Dec 16.

Article in English | MEDLINE | ID: mdl-33327932

ABSTRACT

BACKGROUND: RNA viruses mutate at extremely high rates, forming an intra-host viral population of closely related variants, which allows them to evade the host's immune system and makes them particularly dangerous. Viral outbreaks pose a significant threat to public health, and, in order to deal with them, it is critical to infer transmission clusters, i.e., decide whether two viral samples belong to the same outbreak. Next-generation sequencing (NGS) can significantly help in tackling outbreak-related problems. While NGS data is first obtained as short reads, existing methods rely on assembled sequences. This requires reconstruction of the entire viral population, which is complicated, error-prone and time-consuming. RESULTS: The experimental validation using sequencing data from HCV outbreaks shows that the proposed algorithm can successfully identify genetic relatedness between viral populations, infer transmission direction, transmission clusters and outbreak sources, as well as decide whether the source is present in the sequenced outbreak sample and identify it. CONCLUSIONS: The introduced algorithm makes it possible to cluster genetically related samples, infer transmission directions and predict sources of outbreaks. Validation on experimental data demonstrated that the algorithm is able to reconstruct various transmission characteristics. An advantage of the method is its ability to bypass cumbersome read assembly by working directly on raw NGS reads, thus eliminating the chance of introducing new errors and saving processing time.

Subjects

Hepacivirus , RNA Viruses , Algorithms , Disease Outbreaks , Hepacivirus/genetics , High-Throughput Nucleotide Sequencing
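
The entry above names earth mover's distance (EMD) in its title but the abstract does not say how it is computed between viral samples. As a minimal sketch of the quantity itself, the snippet below handles the simplest case: two distributions over the same ordered bins with unit spacing, where EMD equals the sum of absolute differences between cumulative sums. The function name and the toy distributions are assumptions; comparing real viral populations would need a ground distance between sequences and a general transport solver, neither of which is shown here.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two distributions on the same ordered bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("distributions must have equal total mass")
    # The running surplus after each bin is the mass that must cross to the next bin.
    surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in surplus)


# Toy distributions: half of the mass has to move two bins to the right.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(emd_1d(p, q))
```

The result can be read as the minimum total "mass times distance" needed to reshape one distribution into the other.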

16.

Analysis of heterogeneous genomic samples using image normalization and machine learning.

Basodi, Sunitha; Baykal, Pelin Icer; Zelikovsky, Alex; Skums, Pavel; Pan, Yi.

BMC Genomics ; 21(Suppl 6): 405, 2020 Dec 21.

Article in English | MEDLINE | ID: mdl-33349236

ABSTRACT

BACKGROUND: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features and the need to compare genomic datasets of different sizes and structures. RESULTS: We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. Such a representation makes it possible to restate the problems of classification and comparison of heterogeneous populations as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on the data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy. CONCLUSIONS: The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.

Subjects

Genomics , Machine Learning , Algorithms , Cluster Analysis , Computational Biology , Humans , Quasispecies

17.

Inference of mutability landscapes of tumors from single cell sequencing data.

Tsyvina, Viachaslau; Zelikovsky, Alex; Snir, Sagi; Skums, Pavel.

PLoS Comput Biol ; 16(11): e1008454, 2020 11.

Article in English | MEDLINE | ID: mdl-33253159

ABSTRACT

One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. Inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inference of mutation rates of individual cancer subclones using single-cell sequencing data. It utilizes the partial information about the orders of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluating mutation rates at the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.

Subjects

Mutation , Neoplasms/genetics , Neoplasms/pathology , Single-Cell Analysis/methods , Algorithms , Genomic Instability , Humans

18.

Author Correction: Profiling immunoglobulin repertoires across multiple human tissues using RNA sequencing.

Mandric, Igor; Rotman, Jeremy; Yang, Harry Taegyun; Strauli, Nicolas; Montoya, Dennis J; Van Der Wey, William; Ronas, Jiem R; Statz, Benjamin; Yao, Douglas; Petrova, Velislava; Zelikovsky, Alex; Spreafico, Roberto; Shifman, Sagiv; Zaitlen, Noah; Rossetti, Maura; Ansel, K Mark; Eskin, Eleazar; Mangul, Serghei.

Nat Commun ; 11(1): 4499, 2020 09 04.

Article in English | MEDLINE | ID: mdl-32887888

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

19.

Global transmission network of SARS-CoV-2: from outbreak to pandemic.

Skums, Pavel; Kirpich, Alexander; Baykal, Pelin Icer; Zelikovsky, Alex; Chowell, Gerardo.

medRxiv ; 2020 Mar 27.

Article in English | MEDLINE | ID: mdl-32511620

ABSTRACT

Background: The COVID-19 pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is straining health systems around the world. Although the Chinese government implemented a number of severe restrictions on people's movement in an attempt to contain its local and international spread, the virus had already reached many areas of the world in part due to its potent transmissibility and the fact that a substantial fraction of infected individuals develop little or no symptoms at all. Following its emergence, the virus started to generate sustained transmission in neighboring countries in Asia, Western Europe, Australia, Canada and the United States, and finally in South America and Africa. As the virus continues its global spread, a clear and evidence-based understanding of properties and dynamics of the global transmission network of SARS-CoV-2 is essential to design and put in place efficient and globally coordinated interventions. Methods: We employ molecular surveillance data of SARS-CoV-2 epidemics for inference and comprehensive analysis of its global transmission network before the pandemic declaration. Our goal was to characterize the spatial-temporal transmission pathways that led to the establishment of the pandemic. We exploited a network-based approach specifically tailored to emerging outbreak settings. Specifically, it traces the accumulation of mutations in viral genomic variants via mutation trees, which are then used to infer transmission networks, revealing an up-to-date picture of the spread of SARS-CoV-2 between and within countries and geographic regions. Results and Conclusions: The analysis suggests multiple introductions of SARS-CoV-2 into the majority of world regions by means of heterogeneous transmission pathways. The transmission network is scale-free, with a few genomic variants responsible for the majority of possible transmissions. The network structure is in line with the available temporal information represented by sample collection times and suggests an expected sampling time difference of a few days between potential transmission pairs. The inferred network structural properties, transmission clusters and pathways, and virus introduction routes emphasize the extent of the global epidemiological linkage and demonstrate the importance of internationally coordinated public health measures.

20.

Profiling immunoglobulin repertoires across multiple human tissues using RNA sequencing.

Mandric, Igor; Rotman, Jeremy; Yang, Harry Taegyun; Strauli, Nicolas; Montoya, Dennis J; Van Der Wey, William; Ronas, Jiem R; Statz, Benjamin; Yao, Douglas; Petrova, Velislava; Zelikovsky, Alex; Spreafico, Roberto; Shifman, Sagiv; Zaitlen, Noah; Rossetti, Maura; Ansel, K Mark; Eskin, Eleazar; Mangul, Serghei.

Nat Commun ; 11(1): 3126, 2020 06 19.

Article in English | MEDLINE | ID: mdl-32561710

ABSTRACT

Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementarity-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissue types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often do not have reported Ig repertoire information. Moreover, the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues are also evaluated. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.

Subjects

Complementarity Determining Regions/genetics, Computational Biology/methods, RNA-Seq, Datasets as Topic, Feasibility Studies, Humans, B-Cell Antigen Receptors/genetics
